-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Enable CUDA provider option configuration for C# #7315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| public static SessionOptions MakeSessionOptionWithCudaProvider(OrtCUDAProviderOptions cuda_options) | ||
| { | ||
| CheckCudaExecutionProviderDLLs(); | ||
| SessionOptions options = new SessionOptions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
options [](start = 27, length = 7)
SessionOptions is a IDisposable. What happens to this instance when either of the below native methods return an error VerifySuccess throws?
What happens to cuda_options_native handle when the second call to native method throws? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put native methods inside using-block. Since SessionOptions is an IDisposable, dispose function will be called once errors/exceptions happen.
BTW, I found out that our SessionOptions class didn't contain dispose function. Just added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of SessionOptions is correct. You do not need to change it. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've undo my change. #Resolved
| CheckCudaExecutionProviderDLLs(); | ||
| SessionOptions options = new SessionOptions(); | ||
|
|
||
| OrtCUDAProviderOptionsNative cuda_options_native; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda_options_native [](start = 41, length = 19)
Where is the code that disposes of cuda_options_native? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda_options_native is managed object, so it will be reclaimed by gc. Correct me if I'm wrong. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure it is. I suggest renaming the class so it does not include native in its name.
In reply to: 612567480 [](ancestors = 612567480)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it to avoid confusion. #Resolved
| /// Overrides SafeHandle.IsInvalid | ||
| /// </summary> | ||
| /// <value>returns true if handle is equal to Zero</value> | ||
| public override bool IsInvalid { get { return handle == IntPtr.Zero; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsInvalid [](start = 29, length = 9)
My understanding is that handle is never initialized and therefore is never valid. Is this ok? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In constructor, IntPtr.Zero has been assigned to handle.
Therefore "handle == InPtr.Zero" will return true.
https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle.-ctor?view=net-5.0
https://stackoverflow.com/questions/17645216/whats-the-purpose-of-invalidhandlevalue-in-safehandle #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I will always be invalid. What is the purpose of this class being SafeHandle?
In reply to: 612550263 [](ancestors = 612550263)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future use. But agree with you that it's not needed to inherit from SafeHandle. Already made this class simple static class. #Resolved
| /// <summary> | ||
| /// Holds provider options configuration for creating an InferenceSession. | ||
| /// </summary> | ||
| public class ProviderOptions : SafeHandle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SafeHandle [](start = 35, length = 10)
You made this class IDisposable while the code does not have a real handle. My understanding that you did it for the future. However, this means, that even now it must be disposed.
Even if it currently does not have a real handle to dispose, in the future, it might have it, but the code will set a bad example of not disposing it and will leak. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding the importance of presence of dispose function.
I've added dispose function in this class, even though it currently doesn't have a real handle as you found. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to read on how to implement IDisposable and what SafeHandle is.
- What is it that you are disposing? Handle is never initialized to any meaningful value and it is not clear what it represents. Your Dispose() simply sets it to zero and does nothing.
- SafeHandle already implements Dispose() and it calls ReleaseHandle() so the imlpementor of the class does the custom thing that they need to do. In this case your Dispose() overrides SafeHandle implementation and prevents ReleaseHandle() to be ever called. If it was it would be a never ending recursion.
- Your handle always has a zero value so SafeHandle would never calls ReleaseHandle() because it thinks it is already disposed.
This class as far as I can see should not inherit from SafeHandle.
In reply to: 612546105 [](ancestors = 612546105)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I made this class a derived class from SafeHandle is for future use where the class needs to call ort native api and should be released once it's done.
Agree with you that so far this class doesn't need to inherit from SafeHandle. For simplicity, I made this class a simple static class.
BTW, just curious regarding second point you mentioned that it would be a never ending recursion. Since ReleaseHandle will never be called, and my implementation of Dispose() will be called once, how can recursion happen? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not override Dispose(). If that is removed and ReleaseHandle() continues to call Dispose() it would recurse.
In reply to: 613739660 [](ancestors = 613739660)
|
This PR is critical for working around cudnn algorithm selection bug: #7212. Hope it gets in soon :) |
| { | ||
| NativeApiStatus.VerifySuccess(NativeMethods.SessionOptionsAppendExecutionProvider_CUDA(options.Handle, ref cuda_options_native)); | ||
| NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionOptionsAppendExecutionProvider_CPU(options.Handle, 1)); | ||
| return options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return options; [](start = 16, length = 15)
So using() clause would dispose of SessionOptions and you would return a disposed (dead) object, correct? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I'm being cursory. I've modified this function back to its original state.
To answer your previous question - What happens to this instance when either of the below native methods return an error VerifySuccess throws?
https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle.releasehandle?view=net-5.0
My understanding is that even if VerifySuccess throws exception, still this instance will be handled by garbage collector. Due to this instance is SafeHandle, gc will call ReleaseHandle after normal finalizers have been run for objects that were garbage collected at the same time. Therefore, SessionOptions instance will be reclaimed.
I'm also thinking about using following code:
SessionOptions options = new SessionOptions();
try {
NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionOptionsAppendExecutionProvider_ROCM(options.Handle, deviceId));
NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionOptionsAppendExecutionProvider_CPU(options.Handle, 1));
}
catch (Exception e) {
options.Dispose();
}
but I think it's not necessary. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a correct imlpementation. You will need to add rethrow to the exception. Before rethrowing you will need to make sure that options do not leak native resources.
You want to return options only when everything succeeds.
GC may never be called because it does not know we are running out of native memory. The same applies to file handles, locks and other native and non-managed memory resources. GC does not know anything about those things. All it knows is managed memory. These are completely independent things.
Even if GC is called (pure luck), it does not mean finalizers are called bc of the variety of reasons, there are multiple generations of garbage etc. GC should kick in first and then the finalizer and it may never happen before you run out resources and because managed memory may be OK at the same time. That's why C++ has destructors, but C# had IDisposable()
In reply to: 613934184 [](ancestors = 613934184)
| handle = IntPtr.Zero; | ||
|
|
||
| disposed_ = true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SafeHandle already implements Dispose() and it calls ReleaseHandle below(). Please, remove this code.
Implementing IDisposable() is not for the faint of heart and one is advised to read the documentation first. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. #ByDesign
| /// <summary> | ||
| /// Provider options for CUDA EP. | ||
| /// </summary> | ||
| public struct OrtCUDAProviderOptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OrtCUDAProviderOptions [](start = 18, length = 22)
This seems to be a duplicate class. Have you considered to use the same class everywhere? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made only one OrtCUDAProviderOptions exist. #Resolved
| [StructLayout(LayoutKind.Sequential)] | ||
| public struct OrtCUDAProviderOptionsNative | ||
| { | ||
| public int device_id; // cuda device with id=0 as default device. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int [](start = 15, length = 3)
I would want to verify that this is being marshalled correctly. In C# int is always 4 bytes. In the native code on Windows it is 4 bytes but on Linux it is 8 bytes. So it might work OK on windows but we need it work everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Even on Linux int is still 4 bytes. You were talking 'long' #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have an issue here. The new C API was added without giving a though how this is going to interact with other languages even though the purpose of having C API is language binding. So you will need to verify the fact in the native debugger that the marshalling occurs correctly and make the test test.
In reply to: 613468046 [](ancestors = 613468046)
| { | ||
|
|
||
| OrtCUDAProviderOptions cuda_options = ProviderOptions.GetDefaultCUDAProviderOptions(); | ||
| using (var sessionOptions = new SessionOptions()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not test anything. Let's find a way to verify that this indeed happened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, currently I forgot to pass USE_CUDA in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see "Microsoft.ML.OnnxRuntime.Tests.InferenceTest.TestGpu" was skipped in your test run.
|
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details. |
Description:
Support for configuring a CUDA EP instance via C#.
This PR is the continuation/modification of #6291
Motivation and Context